Analysing spatial data to address environmental issues


Adithi R. Upadhya

28 Nov 2022

Outline

  • Why and how we model air pollution
    • Data collection, algorithms, input data
  • Implementation in R
    • Shiny apps, reproducible code
  • Implementation in Earth Engine and QGIS
    • Data retrieval and visualization

Project details

  • Set up an air quality sensor network and conducted 100+ days of mobile monitoring of air pollutants in urban Bangalore.
  • Built Land Use Regression (LUR) models: statistical models that estimate air pollution concentrations by relating measured concentrations to predictors such as land use, traffic, and satellite retrievals.
  • Estimated air pollution for Bangalore at 50 m resolution for 2021-2022 using ambient and mobile monitoring data.
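
To make the 50 m estimate concrete, a prediction surface can be treated as a regular grid of cell centres at which the model is evaluated. The sketch below is only an illustration in Python, not the project's R code; the function name `grid_centres` and the bounding box are hypothetical, and coordinates are assumed to be in a projected system with metre units.

```python
def grid_centres(xmin, ymin, xmax, ymax, cell=50.0):
    """Return (x, y) centres of square cells that fit fully inside the box.

    Assumes projected coordinates in metres; `cell` is the cell size.
    """
    nx = int((xmax - xmin) // cell)  # whole cells along x
    ny = int((ymax - ymin) // cell)  # whole cells along y
    return [
        (xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell)
        for j in range(ny)
        for i in range(nx)
    ]


# Hypothetical 200 m x 100 m box, giving 4 x 2 cells of 50 m.
centres = grid_centres(0.0, 0.0, 200.0, 100.0)
```

Each centre would then receive its own predictor values (road length, NDVI, and so on) before the fitted model is applied.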

Study Area

Flow

flowchart LR
  A[Ambient] --> C(LUR variables)
  C --> D(training)
  D --> E(validation)
  E --> F(prediction monthly)

flowchart LR
  A[Mobile monitoring] --> C(LUR variables)
  C --> D(training)
  D --> E(validation)
  E --> F(prediction)

Data Collection

Data Correction

mmaqshiny

Near real time check

Aggregation

Earth Engine

Variables

  • Other variables (airport, industries)
  • Road length (with buffers)
  • Rail length (with buffers)
  • Population (with buffers)
  • Elevation
  • Normalized difference vegetation index (NDVI) (with buffers)
  • Night Time Light Intensity (NTLI)
  • Land Cover (with buffers)
  • Aerosol Optical Depth (AOD)
  • Meteorology
  • NO2
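
Several of the variables above are computed "with buffers", meaning the quantity is summarised inside a circle around each monitoring site. As a minimal sketch of that idea for road length, the Python below clips each road segment to a circular buffer and sums the clipped lengths. It is an illustration only: the function names, the toy road geometry, and the buffer radii (50, 100, 300 m) are assumptions and not taken from the project, and coordinates are assumed to be projected metres.

```python
import math


def length_in_buffer(p, q, center, radius):
    """Length of the segment p-q that lies inside a circle."""
    (x1, y1), (x2, y2) = p, q
    cx, cy = center
    dx, dy = x2 - x1, y2 - y1
    fx, fy = x1 - cx, y1 - cy
    a = dx * dx + dy * dy
    if a == 0.0:
        return 0.0  # degenerate segment has no length
    # Solve |p + t*(q - p) - center|^2 = radius^2 for t.
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0  # the line misses (or only touches) the circle
    root = math.sqrt(disc)
    t_in = max(0.0, (-b - root) / (2.0 * a))   # clamp to segment start
    t_out = min(1.0, (-b + root) / (2.0 * a))  # clamp to segment end
    if t_out <= t_in:
        return 0.0
    return (t_out - t_in) * math.sqrt(a)


def road_length_by_buffer(roads, site, radii):
    """Total road length inside each buffer radius around one site."""
    result = {}
    for r in radii:
        total = 0.0
        for line in roads:
            for p, q in zip(line, line[1:]):
                total += length_in_buffer(p, q, site, r)
        result[r] = total
    return result


# Toy data: two straight roads near a site at the origin (hypothetical).
roads = [[(-200.0, 0.0), (200.0, 0.0)], [(0.0, 30.0), (0.0, 500.0)]]
lengths = road_length_by_buffer(roads, site=(0.0, 0.0), radii=[50, 100, 300])
```

In practice a GIS library would handle this clipping; the hand-written geometry is used here only to keep the sketch self-contained.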

Predictors / Variables

Land Use Regression

Air pollutant concentration = f(land use, road length, meteorological variables)
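
When f is a linear regression, as in the model example that follows, the relationship is commonly written in the generic form below. The symbols are assumptions introduced here for illustration: C_i is the concentration at site i, x_{ik} is the value of predictor k at that site, and the beta terms are fitted coefficients.

```latex
C_i = \beta_0 + \sum_{k=1}^{p} \beta_k \, x_{ik} + \varepsilon_i
```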

Model example - Supervised Linear Regression

  • Eeftens, M., Beelen, R., De Hoogh, K., Bellander, T., Cesaroni, G., Cirach, M., … & Hoek, G. (2012). Development of land use regression models for PM2.5, PM2.5 absorbance, PM10 and PMcoarse in 20 European study areas; results of the ESCAPE project. Environmental Science & Technology, 46(20), 11195-11205. https://doi.org/10.1021/es301948k

Validation and Prediction

  • Model performance was evaluated by leave-one-out cross-validation (LOOCV): each site was left out in turn and the model was refitted with the same set of variables.
  • For mobile monitoring data, we conducted 10-fold cross-validation by randomly splitting the road types.
  • Models with an adjusted R² of at least 0.3 were used to predict at 50 m resolution for the urban Bangalore area.
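
The validation steps above can be sketched roughly as follows. This is a simplified Python illustration, not the project's R code: it uses a single predictor, ordinary least squares, and made-up numbers, and all function and variable names are hypothetical. Only the 0.3 adjusted R² cut-off comes from the slides.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(obs, pred):
    """Coefficient of determination between observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, n_predictors):
    """Adjust R^2 for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def loocv_predictions(x, y):
    """Refit without each site in turn and predict the held-out site."""
    preds = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(b0 + b1 * x[i])
    return preds


# Made-up example values: road length in a buffer (m) and PM2.5 at 8 sites.
road_length = [120.0, 300.0, 80.0, 450.0, 210.0, 390.0, 60.0, 270.0]
pm25 = [31.0, 38.5, 29.0, 46.0, 35.5, 41.0, 30.0, 37.0]

b0, b1 = fit_line(road_length, pm25)
fitted = [b0 + b1 * v for v in road_length]
adj_r2 = adjusted_r_squared(r_squared(pm25, fitted), n=len(pm25), n_predictors=1)
loocv_r2 = r_squared(pm25, loocv_predictions(road_length, pm25))

# Threshold from the slides: keep models with adjusted R^2 of at least 0.3.
keep_model = adj_r2 >= 0.3
```

Comparing `loocv_r2` with the in-sample value gives a rough sense of how much the fit depends on any single site.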

Mobile Monitoring (MM) models

Prediction map (MM) visualised using QGIS

Prediction map

Implementation

  • Using tidyverse and parallel processing
  • Using GeoPackage instead of shapefile or GeoJSON
  • Using Postgres as the database
  • Using Earth Engine for high-resolution processed data retrieval

Other models

  • Machine learning model:
    • Random forest
  • Geostatistical model:
    • Geographically weighted regression

Thank you!

Link adithiru.com
Twitter @AdithiUpadhya
GitHub @adithirgis

Land Use Regression Variables 1

Land Use Regression Variables 2